A multimodal learning interface for word acquisition
Abstract
We present a multimodal interface that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while describing their behaviors in natural language. We collect acoustic signals together with user-centric multisensory information from non-speech modalities, such as video from the user’s perspective, gaze positions, head directions and hand movements. We develop a multimodal learning algorithm that first spots words in continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to use non-speech contextual information to facilitate word spotting, and to use temporal correlations among data from different modalities to build hypothesized lexical items. From those items, an EM-based method selects the correct word-meaning pairs. Successful learning is demonstrated in an experiment on the natural task of “stapling papers”.
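The abstract does not spell out the EM formulation, so the following is only a minimal sketch of how an EM procedure could select word-meaning pairs from co-occurring candidates. It assumes a translation-model style estimator (in the spirit of IBM Model 1), which may differ from the method actually used. The names `em_word_meaning`, `best_word` and `toy_pairs`, and the toy data itself, are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def em_word_meaning(pairs, iterations=20):
    """Estimate p(word | meaning) from co-occurring candidates.

    pairs: list of (words, meanings) tuples, one per temporal window.
    Assumption: an IBM Model 1 style estimator, not necessarily the
    formulation used in the paper.
    """
    words = {w for ws, _ in pairs for w in ws}
    meanings = {m for _, ms in pairs for m in ms}
    # Start from a uniform table so that no pairing is favored.
    t = {(w, m): 1.0 / len(words) for w in words for m in meanings}

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each word occurrence among the meanings
        # present in the same window, in proportion to the table.
        for ws, ms in pairs:
            for w in ws:
                norm = sum(t[(w, m)] for m in ms)
                for m in ms:
                    frac = t[(w, m)] / norm
                    count[(w, m)] += frac
                    total[m] += frac
        # M-step: renormalize the expected counts per meaning.
        for (w, m) in t:
            t[(w, m)] = count[(w, m)] / total[m] if total[m] > 0 else 0.0
    return t


def best_word(t, meaning):
    """Return the word with the highest p(word | meaning)."""
    candidates = {w: p for (w, m), p in t.items() if m == meaning}
    return max(candidates, key=candidates.get)


# Hypothetical toy data: spotted words paired with grounded meanings.
toy_pairs = [
    (["stapling", "paper"], ["STAPLE", "PAPER"]),
    (["folding", "paper"], ["FOLD", "PAPER"]),
    (["folding", "card"], ["FOLD", "CARD"]),
]
table = em_word_meaning(toy_pairs)
for meaning in ["STAPLE", "PAPER", "FOLD", "CARD"]:
    print(meaning, "->", best_word(table, meaning))
```

In this sketch, no single window disambiguates "stapling" from "paper", but the repeated co-occurrence of "paper" with PAPER across windows lets the iterations shift probability mass toward the consistent pairings.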
Similar resources
Towards Understanding Child Language Acquisition: An Unsupervised Multimodal Neural Network Approach
This paper presents an unsupervised, multimodal, neural network model of early child language acquisition that takes into account the child’s communicative intentions as well as the multimodal nature of language. The model exhibits aspects of one-word child language such as generalisation to new and unforeseen utterances, a U-shaped learning trajectory and a vocabulary spurt. A probabilistic ga...
The Role of Repeated Exposure to Multimodal Input in Incidental Acquisition of Foreign Language Vocabulary
Prior research has reported incidental vocabulary acquisition with complete beginners in a foreign language (FL), within 8 exposures to auditory and written FL word forms presented with a picture depicting their meaning. However, important questions remain about whether acquisition occurs with fewer exposures to FL words in a multimodal situation and whether there is a repeated exposure effect....
A Localist Neural Network Model for Early Child Language Acquisition from Motherese
This paper presents a localist multimodal neural network that uses Hebbian learning to acquire one-word child language from child directed speech (CDS) comprising multiword utterances and queries in addition to one-word utterances. The model implements cross-situational learning between linguistic words used in child directed speech, the accompanying perceptual entities, conceptual relations an...
Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words
The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remain a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been directed to the raw input children achieve. In this work we present a comparatively large-scale ...
A Computational Model for Taxonomy-Based Word Learning Inspired by Infant Developmental Word Acquisition
To develop human interfaces such as home information equipment, highly capable word learning ability is required. In particular, in order to realize user-customized and situation-dependent interaction using language, a function is needed that can build new categories online in response to presented objects for an advanced human interface. However, at present, there are few basic studies focusin...